Zhigang Lu
18/06/2019
We have PacBio Iso-Seq reads from male, female and somule samples. Reads from the different stages were merged.
Raw reads -> Filter reads -> Merge overlapped reads -> Intersect with gene models
This might miss some potential merging information, but we can refer back if needed.
To remove:
Smp_000020,Smp_000030 2 Smp_000020 24 Smp_000030 157
Smp_000110,Smp_000130 1 Smp_000110 10 Smp_000130 48
To keep:
Smp_000700,Smp_000710 10 Smp_000700 4 Smp_000710 78
Smp_004980,Smp_213790 7 Smp_004980 0 Smp_213790 0
Smp_012400,Smp_012410 4 Smp_012400 0 Smp_012410 22
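
The "merge overlapped reads" step in the pipeline above can be illustrated with a minimal sketch. It assumes all reads lie on the same chromosome and strand and are given as closed `(start, end)` intervals. The function name `merge_overlapping` and the toy coordinates are hypothetical; the tool actually used for merging is not stated in these notes.

```python
def merge_overlapping(intervals):
    """Merge overlapping (start, end) intervals.

    Returns a list of (start, end, n_reads) tuples, where n_reads is the
    number of input intervals collapsed into each merged interval.
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the current merged interval: extend it and count the read.
            prev_start, prev_end, n_reads = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), n_reads + 1)
        else:
            # No overlap: start a new merged interval.
            merged.append((start, end, 1))
    return merged


# Toy coordinates (hypothetical, not taken from the data).
reads = [(100, 500), (450, 900), (1200, 1500), (120, 480)]
for start, end, n_reads in merge_overlapping(reads):
    print(start, end, n_reads)
```

A merged interval that spans two annotated genes would correspond to a gene pair such as those listed above.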
In total, Iso-Seq reads mapped to 7121 genes.
| gene | source | gstart | glength | gstrand | read | rstart | rlength | rstdev | diff |
|---|---|---|---|---|---|---|---|---|---|
| Smp_000020 | AUGUSTUS | 46644182 | 28795 | + | plus-SM_V7_1#486 | 46644174 | 28796 | 2977.250 | -8 |
| Smp_000030 | AUGUSTUS | 46620712 | 15557 | + | plus-SM_V7_1#485 | 46614735 | 16467 | 1511.700 | -5977 |
| Smp_000040 | Apollo | 46610461 | 19987 | - | minus-SM_V7_1#450 | 46610433 | 19960 | 1725.950 | 28 |
| Smp_000050 | Apollo | 46565658 | 95034 | - | minus-SM_V7_1#449 | 46556314 | 85485 | 25912.000 | 9344 |
| Smp_000070 | Apollo | 46461772 | 3444 | + | plus-SM_V7_1#484 | 46461142 | 4098 | 297.719 | -630 |
rstdev is the standard deviation of the start coordinates of the merged Iso-Seq reads.
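
The `diff` column in the table above is consistent with a strand-aware offset between `rstart` and `gstart`. The sketch below reproduces it for the five rows shown. The sign convention (`rstart - gstart` on the plus strand, `gstart - rstart` on the minus strand) is inferred from those rows only and is not stated in the notes; `tss_diff` is a hypothetical helper name.

```python
def tss_diff(gstart, rstart, strand):
    """Strand-aware offset between read start and gene start.

    Assumed convention, inferred from the example rows:
    plus strand: rstart - gstart; minus strand: gstart - rstart.
    """
    if strand not in ("+", "-"):
        raise ValueError(f"unexpected strand: {strand!r}")
    sign = 1 if strand == "+" else -1
    return sign * (rstart - gstart)


# (gene, strand, gstart, rstart, diff as reported in the table)
rows = [
    ("Smp_000020", "+", 46644182, 46644174, -8),
    ("Smp_000030", "+", 46620712, 46614735, -5977),
    ("Smp_000040", "-", 46610461, 46610433, 28),
    ("Smp_000050", "-", 46565658, 46556314, 9344),
    ("Smp_000070", "+", 46461772, 46461142, -630),
]

for gene, strand, gstart, rstart, reported in rows:
    computed = tss_diff(gstart, rstart, strand)
    print(gene, computed, computed == reported)
```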
`summary(geneiso$diff)`

| Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|
| -71482.0 | -21.0 | 1.0 | 437.8 | 98.0 | 96381.0 |
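
Counts of diff values per bin (breaks: -1000, -500, -200, -100, 0, 100, 200, 500, 1000):

| diff range (bp) | count |
|---|---|
| -1000 to -500 | 93 |
| -500 to -200 | 172 |
| -200 to -100 | 229 |
| -100 to 0 | 2544 |
| 0 to 100 | 1850 |
| 100 to 200 | 372 |
| 200 to 500 | 291 |
| 500 to 1000 | 159 |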
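
To make the binned counts easier to read, the sketch below sums them into symmetric windows around zero. Shares are computed relative to the binned total only (values within ±1000 bp), because the notes do not say how many entries of `geneiso$diff` fall outside the breaks. `count_within` is a hypothetical helper.

```python
breaks = [-1000, -500, -200, -100, 0, 100, 200, 500, 1000]
counts = [93, 172, 229, 2544, 1850, 372, 291, 159]

# One count per interval between consecutive breaks.
assert len(counts) == len(breaks) - 1
bins = list(zip(breaks[:-1], breaks[1:], counts))

binned_total = sum(counts)


def count_within(limit):
    """Number of binned diff values whose bin lies entirely within +/- limit bp."""
    return sum(c for lo, hi, c in bins if lo >= -limit and hi <= limit)


for limit in (100, 200, 500, 1000):
    n = count_within(limit)
    print(f"within +/-{limit} bp: {n} ({n / binned_total:.1%} of binned values)")
```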
Data from Roquis et al. 2015, PLoS NTD, “The Epigenome of Schistosoma mansoni Provides Insight about How Cercariae Poise Transcription until Infection”.
For H3K4me3, we found relatively sharp peaks with a mean peak maximum located 250 bp (or 1–2 nucleosomes) downstream of the transcription start site (TSS) of genes (Roquis et al 2015)
37.7% lie within 500 bp downstream, and 54% within 750 bp downstream.
If a correct model is defined as:
| gene | isoseq-gene | chipseq-gene | read | chipseq-isoseq |
|---|---|---|---|---|
| Smp_000020 | 8 | 169 | plus-SM_V7_1#486 | 177 |
| Smp_000420 | 27 | NA | plus-SM_V7_3#1745 | NA |
| Smp_000690 | NA | 394 | NoIsoseq | NA |
| Smp_000050 | 9344 | 9386 | minus-SM_V7_1#449 | 42 |
| Smp_000070 | 630 | 85 | plus-SM_V7_1#484 | 235 |
| Smp_000075 | NA | NA | NoIsoseq | NA |
| Smp_000200 | 1587 | 1718 | plus-SM_V7_3#1757 | 3305 |
NA: no data available. The gene might not be transcribed at that stage.
| gene | isoseq-gene | chipseq-gene | read | chipseq-isoseq | group |
|---|---|---|---|---|---|
| Smp_000020 | 8 | 169 | plus-SM_V7_1#486 | 177 | A |
| Smp_000420 | 27 | NA | plus-SM_V7_3#1745 | NA | B |
| Smp_000690 | NA | 394 | NoIsoseq | NA | B |
| Smp_000050 | 9344 | 9386 | minus-SM_V7_1#449 | 42 | C |
| Smp_000070 | 630 | 85 | plus-SM_V7_1#484 | 235 | C |
| Smp_000075 | NA | NA | NoIsoseq | NA | D |
| Smp_000200 | 1587 | 1718 | plus-SM_V7_3#1757 | 3305 | D |
Promoter region (-200 to +5) for genes with accurate TSS (Iso-Seq diff <= 5 bp and ChIP-Seq within 250 bp; 640 genes)
Top motifs using MEME
(It would be better to use genes with similar functions; discovered motifs could also be used to infer binding sites of similar genes that lack sufficient evidence.)
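
The promoter window (-200 to +5) has to be computed in a strand-aware way before sequences are extracted for motif discovery. The following is a minimal sketch that assumes 1-based inclusive coordinates, with "upstream" meaning lower coordinates on the plus strand and higher coordinates on the minus strand. `promoter_window` and the example TSS coordinate are hypothetical and not taken from the analysis.

```python
def promoter_window(tss, strand, upstream=200, downstream=5):
    """Return (start, end) genomic coordinates of the promoter window.

    Assumes inclusive coordinates with start <= end; the window covers
    `upstream` bp before the TSS and `downstream` bp after it, relative
    to the direction of transcription.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError(f"unexpected strand: {strand!r}")


# Hypothetical TSS at position 10000 on each strand.
print(promoter_window(10000, "+"))
print(promoter_window(10000, "-"))
```

The resulting intervals would still need to be clipped at chromosome ends and, for minus-strand genes, reverse-complemented after extraction.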
Promoter region for genes with wrong TSS (Iso-Seq diff > 500 bp, ChIP-Seq diff > 500 bp, and Iso-Seq to ChIP-Seq distance < 250 bp; 464 genes)
Top motifs:
(Is it possible that some of these TSSs are correct but were grouped incorrectly?)
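
The two selection criteria stated for the promoter sets (accurate TSS and wrong TSS) can be sketched as a simple classifier over the columns of the gene table shown earlier. This assumes the `isoseq-gene`, `chipseq-gene` and `chipseq-isoseq` values are absolute distances in bp, that "ChIP-Seq within 250 bp" means `chipseq-gene <= 250`, and that NA is represented as `None`. `classify_tss` is a hypothetical helper, and how these two sets relate to groups A to D is not stated in the notes.

```python
def classify_tss(isoseq, chipseq, chip_iso):
    """Return 'accurate', 'wrong' or 'other' for one gene (None marks NA)."""
    if isoseq is None or chipseq is None:
        return "other"
    # Accurate TSS: Iso-Seq diff <= 5 bp and ChIP-Seq within 250 bp.
    if isoseq <= 5 and chipseq <= 250:
        return "accurate"
    # Wrong TSS: both far from the annotated start but agreeing with each other.
    if chip_iso is not None and isoseq > 500 and chipseq > 500 and chip_iso < 250:
        return "wrong"
    return "other"


# (gene, isoseq-gene, chipseq-gene, chipseq-isoseq) from the table above.
genes = [
    ("Smp_000020", 8, 169, 177),
    ("Smp_000420", 27, None, None),
    ("Smp_000690", None, 394, None),
    ("Smp_000050", 9344, 9386, 42),
    ("Smp_000070", 630, 85, 235),
    ("Smp_000075", None, None, None),
    ("Smp_000200", 1587, 1718, 3305),
]

labels = {gene: classify_tss(iso, chip, both) for gene, iso, chip, both in genes}
for gene, label in labels.items():
    print(gene, label)
```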